The transcription and replication of the severe acute respiratory syndrome (SARS) coronavirus (SARS-CoV) are regulated by specific viral genome sequences within the 5’- and 3’-untranslated regions (5’-UTR and 3’-UTR). Here we report the solution structure of the 5’-UTR-derived stem-loop 2 (SL2) of SARS-CoV determined by NMR spectroscopy. The highly conserved pentaloop of SL2 is stacked on a 5-bp stem and adopts a canonical CUYG tetraloop fold with the 3’ nucleotide (U51) flipped out of the stack. The significance of this structure in the context of a previous mutagenesis analysis of SL2 function in replication of the related group 2 coronavirus, mouse hepatitis virus, is discussed.
1. Introduction


Severe acute respiratory syndrome (SARS) is a disease caused by the SARS-associated coronavirus (SARS-CoV), which has a single-stranded, positive-sense RNA genome of ~30 kb in length. For all CoVs, the 5’ two-thirds of the genome encodes non-structural proteins involved in proteolytic processing of the gene 1 polyprotein, virus genome replication and subgenomic RNA (sgRNA) synthesis, and the 3’ one-third of the genome encodes structural and accessory proteins (Fig. 1A).

Coronaviruses express seven to nine sgRNAs during replication, each containing a common 5’ leader sequence and 3’-untranslated region (UTR) that harbor important structural elements involved in replication and/or translation [1-5]. Although the mechanism of CoV transcription and replication remains poorly understood, discontinuous transcription during minus-strand synthesis is the currently accepted model. A nested set of subgenome-sized, co-terminal negative-sense RNAs is transcribed from positive-sense genomic RNA by the viral transcriptase/replicase complex (TRC); these RNAs then serve as templates for subgenomic mRNA (sg mRNA) synthesis. The 3’-end of the ~70-nt leader within the 5’-UTR contains a short (6- to 8-nucleotide) sequence, the transcriptional regulatory sequence (TRS-L), which is also present in the genome just 5’ to each structural gene (TRS-B) [6]. Molecular genetic studies are consistent with a leader-body joining model in which the complement to TRS-B on newly synthesized minus strands base-pairs with TRS-L to regulate the synthesis of sgRNAs by template switching [7-10].

Secondary structural models predict that the 5’ region of the 5’-UTR folds into three major stem-loops, SL1, SL2, and SL4b [11,12]. SL3, which harbors the TRS-L (5’-CUCAAAC), is only predicted to be stable at 37 °C for OC43 and SARS-CoV [11] (Fig. 1B). Mutations in the helical stem of SL1 or the loop of SL2 have pronounced effects on mouse hepatitis virus (MHV) replication, largely manifested as a defect in sgRNA transcription [5,11,13]. Although the sequences and predicted secondary structures of the MHV and SARS-CoV 5’-UTRs are significantly different, the SARS-CoV SL1, SL2, and SL4 can functionally replace their MHV counterparts in the MHV genome and produce viable chimeric viruses [14].

Fig. 1. Coronavirus genome and SL2 secondary structure. (A) ORFs of coronavirus. Gene 1 (1a and 1b), hemagglutinin-esterase (HE), spike glycoprotein (S), accessory proteins (AP), membrane protein (M), nucleocapsid (N), and 5’- and 3’-UTR. (B) Predicted secondary structure of the 5’-UTR of SARS-CoV. Stem-loops (SL1, SL2, SL3, and SL4) are indicated. (C) Primary sequences of the SL2 loop region from five coronaviruses. Stem regions are underlined. (D) Schematic representation of the sequence conservation of SL2 of all coronaviruses.

Excepting the TRS, SL2 is the most highly conserved sequence in the 5’-UTRs of CoVs [11] and is characterized by a pentaloop (C47-U48-U49-G50-U51 in SARS-CoV) stacked on a 5-bp stem (Fig. 1C and D), with some CoV sequences containing an additional U 3’ to U51 [11]. Here we report the structure of SL2 of SARS-CoV determined by NMR spectroscopy. SL2 adopts a tetraloop fold stacked on a helical stem. Tetraloops have been grouped by their sequence and conserved structures into five types: (i) GNRA, (ii) UNCG, (iii) ANYA, (iv) (U/A)GNN, and (v) CUYG. Recently, they have been further subclassified according to specific deviations from the standard tetraloop motif, e.g., a 3-2 switch, deletion, insertion, and strand clips [15]. SL2 adopts the CUYG-like, insertion-type tetraloop structure, which features a C47-G50 Watson-Crick (WC) base pair with the conserved 3’ nucleotide, U51, flipped out of the stack.
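The sequence-based grouping of tetraloops described above can be illustrated with a short pattern-matching sketch. It assumes the standard IUPAC nucleotide codes (R = A or G, Y = C or U, N = any base), writes (U/A)GNN with W = A or U, and writes the second family as UNCG, the form in which it is usually cited. The function and dictionary names are hypothetical, and the sketch compares sequence only; it says nothing about whether a loop actually adopts the corresponding fold.

```python
import re

# IUPAC nucleotide codes needed for the family names (RNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "W": "[AU]", "N": "[ACGU]",
}

# The five sequence families named in the text; (U/A)GNN is encoded as WGNN.
FAMILIES = {
    "GNRA": "GNRA",
    "UNCG": "UNCG",
    "ANYA": "ANYA",
    "(U/A)GNN": "WGNN",
    "CUYG": "CUYG",
}


def family_regex(pattern):
    """Translate an IUPAC pattern such as 'CUYG' into a compiled regex."""
    return re.compile("".join(IUPAC[code] for code in pattern))


def classify_tetraloop(loop):
    """Return the names of all families whose pattern matches a 4-nt loop."""
    loop = loop.upper().replace("T", "U")
    if len(loop) != 4:
        raise ValueError("expected exactly four loop nucleotides")
    return [name for name, pattern in FAMILIES.items()
            if family_regex(pattern).fullmatch(loop)]


if __name__ == "__main__":
    # First four loop residues of SARS-CoV SL2: C47-U48-U49-G50.
    print(classify_tetraloop("CUUG"))
```

A loop can match more than one pattern (for example, AGUA fits both ANYA and (U/A)GNN), which is why the function returns a list rather than a single label.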


2. Materials and methods 
2.1. Sample preparation 


Unlabeled and ¹³C,¹⁵N-[U]-labeled wild-type (WT) RNA were prepared as described previously [5]. For NMR, SL2 was dissolved in 10 mM potassium phosphate, pH 6.0, in 10% D2O/90% H2O or 100% D2O. All RNAs were monomeric under these conditions, as verified by non-denaturing polyacrylamide gel electrophoresis.


2.2. NMR spectroscopy 


NMR experiments were acquired on a Varian Inova 500 or 600 MHz spectrometer at 283 and 298 K [5]. NMR data were processed and analyzed with NMRPipe [16], Sparky [17] and NMRView [18]. Several mixing times (τm = 60, 250, and 280 ms) in 2D-NOESY experiments were tested to confirm the absence of significant spin diffusion. A 2D ¹H-NOESY spectrum (τm = 200 ms) in 10% D2O/90% H2O was acquired to obtain imino proton resonance assignments, while 2D ¹H-NOESY (τm = 250 ms) and 2D ¹H-TOCSY experiments in D2O were performed to obtain non-exchangeable proton resonance assignments and NOE restraints using standard methodologies [19].


2.3. Structure calculation and analysis 


NOE peak assignment and initial NOE constraints were obtained with CYANA [20] and CANDID [21]. All NOE constraints were manually confirmed during the CYANA calculations. Hydrogen bonding constraints were introduced for all base pairs, and artificial torsion angle restraints derived from high-resolution crystal structures of A-form double-helical RNA were used to improve convergence of the ensemble [22].

The initial 100 structures were calculated by a simulated annealing protocol with Xplor-NIH [23] and were further refined using a conformational database potential [24] and planarity restraints for the helical stem region. Iterative refinement and editing of the distance restraints based on the NOESY spectra to remove incorrect and ambiguous assignments reduced the number of restraints. Force constants were 0.2-30 kcal mol⁻¹ Å⁻² for NOE restraints and 10-100 kcal mol⁻¹ rad⁻² for dihedral angle restraints in the refinement calculations. The final 27 structures with the lowest energy were chosen for analysis using the programs Xplor-NIH and 3DNA [25] and are deposited in the PDB (accession code 2L6I). NOEs in the loop region (U46-A52) of the SL2 RNA were confirmed by back-calculation of the NOE intensity using Xplor-NIH (see Table S1 and Fig. S1). Chemical shifts of the SL2 RNA are deposited in the BMRB (accession code 17309). Figures were prepared using the program PyMOL [26].
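To illustrate how a distance-restraint force constant and the 0.3 Å violation threshold of Table 1 enter a calculation of this kind, the sketch below evaluates a simple flat-bottomed harmonic penalty. The functional form is an assumption made for illustration only; the text does not state which NOE potential was used in Xplor-NIH. The function names are hypothetical and the distances and bounds in the usage example are invented.

```python
def restraint_excess(distance, lower, upper):
    """Distance (in Å) by which a value falls outside [lower, upper]; 0 if inside."""
    if distance > upper:
        return distance - upper
    if distance < lower:
        return lower - distance
    return 0.0


def restraint_energy(distance, lower, upper, force_constant):
    """Flat-bottomed harmonic penalty: k * excess**2 (kcal/mol if k is in kcal/mol/Å^2)."""
    return force_constant * restraint_excess(distance, lower, upper) ** 2


def count_violations(distances, bounds, threshold=0.3):
    """Count restraints whose excess is larger than the threshold (Å)."""
    return sum(
        1
        for distance, (lower, upper) in zip(distances, bounds)
        if restraint_excess(distance, lower, upper) > threshold
    )


if __name__ == "__main__":
    # Invented example: three restraints sharing the bounds 1.8-5.0 Å.
    bounds = [(1.8, 5.0)] * 3
    distances = [4.0, 5.2, 5.5]
    print(count_violations(distances, bounds))
    # Penalty for the third restraint at the upper end of the quoted k range.
    print(restraint_energy(5.5, 1.8, 5.0, 30.0))
```

With a penalty of this shape, a restraint contributes nothing while the model distance stays within its bounds, so a final ensemble with zero violations above 0.3 Å carries only small residual restraint energies.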


3. Results and discussion 
3.1. Solution structure of SL2 


The coronavirus SL2 used in this study is SARS-CoV SL2, which contains a conserved 5’-CUUGU pentaloop and differs from the MHV SL2 only in the identity of two of the five bp in the stem (Fig. 1C). The SL2 construct used for NMR contains a non-native 3’ A to stabilize the base of the stem. In the initial CYANA-derived structures, C47 was found to stack on U46 with G50 stacked on A52 (Fig. 2A) and U51 flipped out from the stem (see also [5]). G50 adopted a high-anti glycosidic bond angle. These structural characteristics are found in the CUGG tetraloop structure containing a base pair between Cᵢ (C47) and Gᵢ₊₃ (G50) [22], a finding also consistent with the recovery of second-site C47A-G50U MHV viruses from G50U MHV stocks after multiple passages [5]. We therefore added hydrogen bonding constraints between C47 and G50 in the final refinement step, although the imino proton associated with this base pair could not be detected experimentally.

Fig. 2. NOEs and solution structure of SL2. (A) Key NOEs in the loop region observed in a ¹H-¹H NOESY spectrum acquired in D2O that establish inter-residue interactions (C45-U46-C47-U49, G50-A52). (B) Stereo pair of the lowest energy 27 structures superposed on heavy atoms. The loop residues are colored in red (C47), blue (U48), grey (U49), green (G50), and yellow (U51). (C) Ribbon representation of the SL2 structure using the same coloring scheme as in (B). The left and right models are rotated 180° relative to one another about a vertical axis. Residues in the pentaloop are labeled according to the SARS-CoV/MHV nucleotide sequences.

Table 1
NMR restraints and structural statistics.

NMR constraints
  Total NOE distance restraints                    213
    Intra-residue (i, i)                           104
    Sequential (i, i + 1)                           90
    Medium-range (2 ≤ |i − j| ≤ 4)                  12
    Long-range (|i − j| ≥ 5)                         7
  Hydrogen bonds                                    42
  Total dihedral angle restraints                   95

Structural statistics (27 structures)
  Violations
    Number of distance restraint violations >0.3 Å       0
    Number of dihedral angle restraint violations >5°    0
  Rms deviation from experiments
    Distance (Å)                        0.052 ± 0.001
    Dihedral angle (°)                  0.100 ± 0.080
  Rms deviation from idealized geometry
    Bonds (Å)                           0.0047 ± 0.0001
    Angles (°)                          0.9860 ± 0.0210
    Impropers (°)                       0.5840 ± 0.0119
  Average pairwise RMS deviations (Å)
    Backbone heavy atoms                0.33 ± 0.13
    All heavy atoms                     0.47 ± 0.18

The NMR structure of SL2 is fully consistent with our previous studies of SL2 [5]. The bundle of structures is well converged, with an average pairwise RMSD of 0.47 Å for all heavy atoms (Table 1). The stem adopts an A-form helix containing five WC base pairs with the 3’ terminal nucleotide A57 disordered (Fig. 2B). The pentaloop is quite well defined and stabilized by base pairing and intra- and inter-nucleotide interactions (Fig. 2C). The U48 base lies in the minor groove of the stem, with the orientation of this base not fully converged (Fig. 2B) but likely stabilized by hydrophobic contacts between the H5 and H6 edge of the U48 base and the sugar ring of C47 (Fig. 3A). U49 stacks on C47 in the C47-G50 base pair and thus caps the helical stem, and the O2 of U49 and the H42 proton of C47 are in close proximity (Fig. 3A). U48, U49 and U51 each adopt a C2’-endo ribose conformation in the SL2 structure, consistent with the strong H1’-H2’ cross peaks in a ¹H-¹H TOCSY spectrum, which report on a large ³J(H1’,H2’) vicinal coupling (Table S2 and Fig. S2). In contrast, C47 and G50 adopt at least some C3’-endo ribose pucker, consistent with their weaker H1’-H2’ cross peaks (Fig. S2), as might be anticipated on the basis of the C47-G50 base pair. The pentaloop is clearly more dynamic than the helical stem region, but this was not systematically investigated further. U51 is flipped out of the stack between G50 and A52. There are no inter-residue interactions involving U51, indicating that U51 is solvent exposed and likely mobile in solution; this is consistent with the sharp linewidths of the H5 and H6 protons [5].


3.2. SL2 adopts a CUYG-like tetraloop structure 


The consensus pentaloop sequence of CoV SL2 is 5’-yYUUGY(U)ₙr (n = 0 or 1) [5] (Fig. 1D) and is therefore consistent with either a U-turn-like structure containing a UNR triloop stacked on the stem as in the VS ribozyme (Fig. 3B) or a 5’-gCUYGc tetraloop, the prototype member of a more diverse CNGG(N)ₙ family of tetraloops (Fig. 3C). The structure of CoV SL2 reveals that the loop of SL2 adopts a CNGG(N)ₙ tetraloop topology [5]. Fig. 3A and C show the structures of the loop of SL2 and the Smaug recognition element (SRE), respectively, the latter of which is a member of the CNGG(N)ₙ tetraloop family. Both pentaloops stack on the stem-closing U-A base pair. The first and fourth residues (C47 and G50 in SL2 and C10 and G13 in SRE) in the loop form a base pair in which the fourth residue adopts a high-anti χ angle (G50 in SL2 = ~−80° and G13 in SRE = ~−60°) [22]. The second base (U48 in SL2 and U11 in SRE) lies in the minor groove and is stabilized by hydrophobic interactions. The third residue (U49 in SL2 and G12 in SRE) stacks on the loop base pair formed between the first and fourth residues, on the opposite side of the molecule. The fifth residue (U51 in SL2 and C14 in SRE) is flipped out from the stack. One difference between these two structures is the identity of the third loop residue in SL2 vs. SRE. The identity of this nucleotide is functionally unimportant in MHV since all U49 substitution mutants of SL2 are viable [5], a finding compatible with the structure.
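The χ values quoted for G50 and G13 are torsion angles about the glycosidic bond, conventionally defined by the atoms O4’-C1’-N9-C4 for purines and O4’-C1’-N1-C2 for pyrimidines. A minimal sketch of how such a signed torsion angle could be computed from four atomic positions is given below. The function names are hypothetical, the coordinates in the usage example are invented rather than taken from the deposited structure (PDB 2L6I), and the code assumes that the four points are distinct and not collinear.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, between -180 and 180.

    Positive values correspond to a clockwise rotation of the far bond
    relative to the near bond when looking from p1 towards p2.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane p0-p1-p2
    n2 = _cross(b2, b3)  # normal to the plane p1-p2-p3
    b2_length = math.sqrt(_dot(b2, b2))
    y = _dot(b2, _cross(n1, n2)) / b2_length
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def glycosidic_chi(o4_prime, c1_prime, n_base, c_base):
    """Chi torsion: O4'-C1'-N9-C4 (purines) or O4'-C1'-N1-C2 (pyrimidines)."""
    return dihedral(o4_prime, c1_prime, n_base, c_base)


if __name__ == "__main__":
    # Invented coordinates chosen so that the torsion is easy to check by eye.
    print(glycosidic_chi((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, -1, 1)))
```

Applied to each model of an ensemble, a function of this kind would give the spread of χ for a residue such as G50, which is one way to judge how well the high-anti conformation is defined.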


3.3. Structure-function correlations 


Fig. 3. Comparison of pentaloop structures of SARS-CoV SL2 (A) (this work; PDB ID, 2L6I), VS ribozyme stem-loop (B) (PDB ID, 1TBK) [31], and Smaug recognition element stem-loop (C) (PDB ID, 2ES5) [22]. In panel (A), the left and right models use the same coloring scheme as Fig. 2B and are rotated 180° relative to one another about a vertical axis. Panels (B) and (C) use a coloring code that is analogous to that shown for SL2 in panel (A) to facilitate comparison of the pentaloop structures.

We previously reported that the MHV SL2 loop is rather highly functionally tolerant of base substitutions [5]. In fact, when a more stable SARS-CoV SL2 stem sequence replaces the native MHV SL2 stem containing multiple A-U base pairs at the base of the stem, both the U48C and G50C mutations, originally characterized as lethal in an all-MHV context, were found to be viable [5]. We therefore previously suggested that SL2 plays a generic structural role in stabilizing a higher-order structure within the 5’-UTR or a 5’-UTR-3’-UTR complex that is important specifically for sgRNA synthesis. Structural and functional data suggest that the identity of U51 is unimportant but that this residue may facilitate the folding of SL2 rather than specifically mediating a long-range RNA-RNA or RNA-protein interaction required for replication [5]. Interestingly, in all recovered ΔU51 MHV viruses, U51 was added back in; furthermore, extrahelical U51-like residues are often conserved in stable tetraloops, including the 5’-CNGG and 5’-YNMG-like tetraloop structures. These findings suggest that U51 plays a critical role in stabilizing the loop structure required for virus viability.

Fig. 4. Prediction of secondary structures and folding free energies (ΔG) of MHV SL2 substitution mutants relative to wild-type SL2. Base pairs in the loop are indicated with a dotted line. Mutated residues are indicated with a dotted circle. The free energies shown (37 °C) were calculated using Vienna RNAfold and a constant free energy increment for the 5’-UUG triloop in each mutant RNA. The stabilities of wild-type SL2 have not yet been experimentally determined.

A base pair between C47 and G50 in SL2 is consistent with the fact that all G50 substitution mutants were found to be lethal in MHV; in contrast, the corresponding C47 substitutions appeared to have comparatively little negative impact on virus titer [5]. Fig. 4 shows predicted secondary structures and free energy differences (ΔG) between selected SL2 C47 and U51 mutants relative to wild-type SL2 [27]. As can be seen, all C47 mutations potentially extend the helical stem by forming an additional base pair with U51, creating a capping YYR triloop, which can be stabilizing [28]. The C47U mutant may incorporate a canonical U47-G50 wobble pair with a wild-type-like tetraloop fold or a non-canonical U-U base pair (U47-U51) closing a YYR triloop as found in 16S rRNA [29,30]. In addition, a U51G mutation is predicted to be even more stabilizing (Fig. 4). Taken together, these predictions partially explain why C47 mutations in an all-MHV context did not strongly negatively impact virus viability, but were absolutely co-dependent on the presence of U51 in the loop. On the other hand, our SL2 structure provides no clear structural rationale as to why the U48C and U48A mutants were lethal in MHV [5]; one strong possibility is that these mutations induce misfolding in the 5’ leader region, facilitated by the weaker SL2 helical stem in MHV relative to SARS-CoV (Fig. 1C-D). Additional structural studies of the entire CoV 5’-UTR will be required to substantiate this proposal.
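The base-identity argument made above for the C47 and U51 mutants can be restated as a small lookup. The sketch below reports, for each variant, which kind of pair positions 47 and 51 could form: Watson-Crick, G-U wobble, or the non-canonical U-U pair mentioned in the text. It considers base identity only and does not estimate free energies (those in Fig. 4 were calculated with Vienna RNAfold); the function and dictionary names are hypothetical.

```python
WATSON_CRICK = {frozenset("AU"), frozenset("GC")}
WOBBLE = {frozenset("GU")}


def pair_type(base_a, base_b):
    """Classify a potential pair by base identity alone; None if no pair is listed."""
    pair = frozenset((base_a, base_b))
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "wobble"
    if base_a == base_b == "U":
        return "non-canonical U-U"
    return None


# Bases at positions 47 and 51 for the wild type and the mutants discussed.
VARIANTS = {
    "WT": ("C", "U"),
    "C47A": ("A", "U"),
    "C47G": ("G", "U"),
    "C47U": ("U", "U"),
    "U51G": ("C", "G"),
}

if __name__ == "__main__":
    for name, (base_47, base_51) in VARIANTS.items():
        print(name, pair_type(base_47, base_51))
```

In this simple view the wild-type C47/U51 combination is the only one listed that cannot pair, which matches the observation that every C47 substitution can close an extra 47-51 pair only when U51 is present.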